uffd: add support for UFFD_EVENT_REMOVE events by bchalios · Pull Request #1896 · e2b-dev/infra

bchalios · 2026-02-11T21:13:23Z

Enabling Firecracker free-page-reporting feature requires us to handle remove events (UFFD_EVENT_REMOVE) in our userfaultfd handler. These events are triggered whenever Firecracker calls madvise(MADV_DONTNEED) (or similar) on a range of guest memory addresses.

The main thing that changes on our logic is that page faults in a page that has previously been removed need to be served with a zero page rather than a page from the snapshot file.

This commit changes the page fault serving logic to:

Introduce tracking of the state of every page in the guest's memory mappings.
Add logic to handle the new UFFD_EVENT_REMOVE event
Modify existing logic to take into account current state when deciding how to handle each page fault

This is dependent on the part of #1858 that enables free page reporting on the Firecracker side.

Note

High Risk
High risk because it changes low-level userfaultfd event handling and page population semantics (including new ioctls and concurrency), which can affect sandbox stability, memory correctness, and performance under load.

Overview
Adds support for Firecracker’s free-page-reporting flow by upgrading to Firecracker v1.14.1 and plumbing an optional freePageReporting setting from API/template config through the orchestrator into the FC boot sequence (enabling the balloon device when allowed by version + feature flag/CLI). The userfaultfd handler is refactored to read and process UFFD_EVENT_REMOVE alongside pagefaults, track per-page state, and serve faults for removed pages as zero-filled data (with new ZEROPAGE/WRITEPROTECT/WAKE ioctls plus EAGAIN deferral), with expanded cross-process tests covering remove ordering, gating, and “always write-protect” assumptions.

^{Written by Cursor Bugbot for commit 91075ab. This will update automatically on new commits. Configure here.}

packages/orchestrator/internal/sandbox/uffd/memory/mapping.go

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

packages/orchestrator/internal/sandbox/uffd/memory/region.go

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

packages/orchestrator/internal/sandbox/fc/process.go

packages/shared/pkg/fc/firecracker.yml

packages/orchestrator/internal/sandbox/uffd/memory/region.go

As we're going to handle UFFD_EVENT_REMOVE events triggerred by Firecracker, we need to keep track of the state of all the guest memory pages. Theis commit introduces 3 states: * Unfaulted - A page that has not been faulted yet. * Faulted - A page that we have previously faulted in. * Removed - A page that we have received a remove event for and haven't faulted in since. It also adds the necessary book keeping of page state in all the memory regions of the guest, along with methods for retrieving and setting the state of pages. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Import a few more bindings that we'll need for handling remove events. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Handle UFFD_EVENT_REMOVE events from the file descriptor. These events are triggerred when Firecracker calls madvise() with MADV_DONTNEED on some memory range that we are tracking. This Firecracker behaviour is support with version 1.14.0 onwards using the free page reporting and hinting features of balloon devices. What this means for us is that, we need to track removed pages because subsequent page faults need to be served with a zero page. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Balloon devices provide memory reclamation facilities through free page reporting, as a more efficient mechanism (in terms of latency and CPU time) than traditional ballooning. Free page reporting instructs the guest to periodically report memory that has been freed, so that we can reclaim it in the host side. It is enabled before starting the sandbox and does not require any further host-side orchestration. Enable free page reporting for all new templates using Firecracker versions >= v1.14.0. Also, allow users to optionally disable it for these versions. Older Firecracker versions don't support the feature. Trying to enable it for those, will return an error. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Do that, so we maintain the dirty page tracking on. Prefaulting does not occur due to write faults, so tracking must be maintained. Writing will clear the bit asynchronously in the kernel. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

userfaultfd_zeropage() does not understand UFFD_COPY_MODE_WP. When zeroing a page we should always be passing 0 in mode (since we always want to unblock the faulting thread). For userfaultfd_copy() we want to provide UFFD_COPY_MODE_WP only when we are copying due to a read. This includes read-triggerred page faults from Firecracker and us pre-faulting. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

During pre-faulting we always recorded an error to the span when `faultPage` returned `!handled` for a block, but not all such cases are actually errors. EEXIST errors means that the page has been faulted already, which can potentially happen since we are pre-faulting and handling on-demand page faults at the same time. EAGAIN happens when a UFFD_EVENT_REMOVE event landed in the UFFD while were trying to handle a page fault. Both those cases are fine from the prefaulter's point of view. In the former, there's nothing to do, on-demand handler won the race. In the latter, we need to let the on-demand page fault handler handle the remove event before we proceed. Only record an error when err != nil. Otherwise, add an event. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

packages/orchestrator/internal/sandbox/uffd/userfaultfd/fd.go

packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go

When guest memory is backed by 4K pages, Firecracker backs it with anonymous pages. Anonymous pages can only be write-protected when there is page entry present. This means that until we receive a fault for it a page is not write-protected. This is fine. Not present (not faulted) pages cannot be dirty, by definition. When handling page faults for known pages, we use UFFDIO_COPY with write-protection mode, which automatically sets write-protection for the faulted page. However, imagine this sequence: 1. We resume from a snapshot 2. We receive a UFFD_EVENT_REMOVE for a range that was not previously faulted in. 3. We receive a page fault for a page within the removed region. The correct way to handle this is providing the zero page and that's what we do. However, UFFDIO_ZERO does not have a write-protected mode, so the newly setup page is not write-protected tracked. Such a page will always be reported dirty from Firecracker (present and !write-protected) and it will be needlessly included in the snapshot. Handle this by explicitly marking such pages as write-protected. Handle the race condition by providing the zero page without waking up the thread, marking it write protected and then waking up the thread. Note, that huge pages don't have this issue because: 1. Hugetlbfs-backed pages are write-protected by Firecracker 2. we always handle huge page faults using UFFDIO_COPY with write-protection mode. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Race between goroutine faulted state and remove state
- I added per-page versioning in pageTracker and changed async fault handlers to mark pages faulted only when the version is unchanged, so stale goroutines can no longer overwrite newer remove state.
✅ Fixed: Prefault omits pageTracker and prefetchTracker updates
- Prefault now records successful prefaults by updating pageTracker state conditionally and adding the offset/access to prefetchTracker under the settleRequests guard.

Or push these changes by commenting:

@cursor push 508bddf9b7

Preview (508bddf9b7)

diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/page_tracker.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/page_tracker.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/page_tracker.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/page_tracker.go
@@ -13,27 +13,35 @@
 type pageTracker struct {
 	pageSize uintptr
 
-	m  map[uintptr]pageState
-	mu sync.RWMutex
+	m        map[uintptr]pageState
+	versions map[uintptr]uint64
+	mu       sync.RWMutex
 }
 
 func newPageTracker(pageSize uintptr) pageTracker {
 	return pageTracker{
 		pageSize: pageSize,
 		m:        make(map[uintptr]pageState),
+		versions: make(map[uintptr]uint64),
 	}
 }
 
 func (pt *pageTracker) get(addr uintptr) pageState {
+	state, _ := pt.getWithVersion(addr)
+
+	return state
+}
+
+func (pt *pageTracker) getWithVersion(addr uintptr) (pageState, uint64) {
 	pt.mu.RLock()
 	defer pt.mu.RUnlock()
 
 	state, ok := pt.m[addr]
 	if !ok {
-		return unfaulted
+		return unfaulted, pt.versions[addr]
 	}
 
-	return state
+	return state, pt.versions[addr]
 }
 
 func (pt *pageTracker) setState(start, end uintptr, state pageState) {
@@ -42,5 +50,20 @@
 
 	for addr := start; addr < end; addr += pt.pageSize {
 		pt.m[addr] = state
+		pt.versions[addr]++
 	}
 }
+
+func (pt *pageTracker) setStateIfVersion(addr uintptr, state pageState, version uint64) bool {
+	pt.mu.Lock()
+	defer pt.mu.Unlock()
+
+	if pt.versions[addr] != version {
+		return false
+	}
+
+	pt.m[addr] = state
+	pt.versions[addr]++
+
+	return true
+}

diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
@@ -23,6 +23,8 @@
 		return fmt.Errorf("data length (%d) does not match pagesize (%d)", len(data), u.pageSize)
 	}
 
+	_, version := u.pageTracker.getWithVersion(addr)
+
 	// We're treating prefault handling as if it was caused by a read access.
 	// This way, we will fault the page with UFFD_COPY_MODE_WP which will preserve
 	// the WP bit for the page. This works even in the case of a race with a
@@ -33,6 +35,9 @@
 	//
 	// In both cases, the WP bit will be cleared because it is handled asynchronously
 	// by the kernel.
+	u.settleRequests.RLock()
+	defer u.settleRequests.RUnlock()
+
 	handled, err := u.faultPage(
 		ctx,
 		addr,
@@ -49,8 +54,13 @@
 
 	if !handled {
 		span.AddEvent("prefault: page already faulted or write returned EAGAIN")
+
+		return nil
 	}
 
+	u.pageTracker.setStateIfVersion(addr, faulted, version)
+	u.prefetchTracker.Add(offset, block.Read)
+
 	return nil
 }
 

diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
@@ -276,7 +276,8 @@
 
 			var source block.Slicer
 
-			switch state := u.pageTracker.get(addr); state {
+			state, version := u.pageTracker.getWithVersion(addr)
+			switch state {
 			case faulted:
 				// Skip faulting the page. This has already been faulted, either during pre-faulting
 				// or because we handled another page fault on the same address in the current
@@ -319,7 +320,7 @@
 				}
 
 				if handled {
-					u.pageTracker.setState(addr, addr+u.pageSize, faulted)
+					u.pageTracker.setStateIfVersion(addr, faulted, version)
 					u.prefetchTracker.Add(offset, accessType)
 				} else {
 					deferred.push(pf)

_{This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.}

cursor · 2026-03-10T20:57:11Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+
+				if handled {
+					u.pageTracker.setState(addr, addr+u.pageSize, faulted)
+					u.prefetchTracker.Add(offset, accessType)


Race between goroutine faulted state and remove state

Medium Severity

A goroutine from iteration N can call pageTracker.setState(addr, faulted) after the main loop in iteration N+1 has already processed a UFFD_EVENT_REMOVE and set that address to removed. The goroutine's setState(faulted) overwrites the removed state, causing future page faults for that address to be skipped (the faulted case does continue). Since the page was MADV_DONTNEED'd, the guest may read stale snapshot data instead of a zero page.

Additional Locations (1)

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L252-L256

cursor · 2026-03-10T20:57:11Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go

+	if !handled {
+		span.AddEvent("prefault: page already faulted or write returned EAGAIN")
+	}
+


Prefault omits pageTracker and prefetchTracker updates

Medium Severity

Prefault calls faultPage but never updates pageTracker to faulted or calls prefetchTracker.Add on success. In the old code, faultPage internally called both missingRequests.Add and prefetchTracker.Add. Now those are the caller's responsibility (done in Serve), but Prefault was not updated. This means prefaulted pages are invisible to the tracker, causing every subsequent on-demand fault for a prefaulted page to attempt an unnecessary UFFDIO_COPY (getting EEXIST), and PrefetchData will not include prefaulted pages.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-11T04:51:54Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+			break
+		}
+		// And, finally, wake up the faulting thread
+		writeErr = u.fd.wake(addr, u.pageSize)


Multi-step 4K zero-fill can leave faulting thread stuck

High Severity

The 4K zero-page path uses a three-step sequence: zero() with UFFDIO_ZEROPAGE_MODE_DONTWAKE, then writeProtect(), then wake(). If zero() succeeds but writeProtect() or wake() returns EAGAIN, the error falls through to the common EAGAIN handler which defers the fault. On retry, zero() returns EEXIST (page already mapped), which the common EEXIST handler treats as successfully handled — but writeProtect and wake are never retried. The faulting thread remains blocked forever because DONTWAKE was used and wake() never completed.

cursor · 2026-03-11T04:51:54Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go

+		span.AddEvent("prefault: page already faulted or write returned EAGAIN")
+	}
+
+	return nil


Prefault omits pageTracker update causing redundant faults

Medium Severity

Prefault calls faultPage but never updates pageTracker to faulted on success, unlike the Serve goroutine which calls pageTracker.setState. This means the on-demand handler in Serve won't see the page as faulted and will attempt to fault it again (getting EEXIST). More importantly, prefaulted pages also bypass prefetchTracker.Add, unlike the old code which called both trackers from within faultPage.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-11T09:29:43Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go

+		span.AddEvent("prefault: page already faulted or write returned EAGAIN")
+	}
+
+	return nil


Prefault no longer updates prefetchTracker causing data loss

Medium Severity

The refactored Prefault calls faultPage but never updates prefetchTracker (or pageTracker). Previously, faultPage internally called u.prefetchTracker.Add(offset, accessType). Now tracker updates only happen inside the Serve loop goroutine. Pages that are only prefaulted and never on-demand faulted (e.g., read-only kernel pages) silently disappear from PrefetchData(), degrading prefetch quality across template generations.

Additional Locations (1)

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L324-L327

e2b-request-same-site-reviewers bot assigned jakubno Feb 11, 2026

bchalios force-pushed the feat/free-page-reporting branch from aef7cd8 to eab27ea Compare February 11, 2026 21:13

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/memory/mapping.go Outdated Show resolved Hide resolved

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go Outdated Show resolved Hide resolved

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go Outdated Show resolved Hide resolved

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/memory/region.go Outdated Show resolved Hide resolved

bchalios force-pushed the feat/free-page-reporting branch 2 times, most recently from a1d0a02 to 03e2966 Compare February 11, 2026 21:20

jakubno assigned ValentaTomas Feb 11, 2026

bchalios force-pushed the feat/free-page-reporting branch 5 times, most recently from b766eb3 to 4d197e4 Compare February 12, 2026 23:48